feat: plot vllm internal metrics to the wandb log#1567
Conversation
📝 WalkthroughWalkthroughThe changes extend vLLM metrics collection to include three new metrics (kv_cache_usage_perc, num_preemptions, generation_tokens), add a per-worker timeline visualization utility to the logger, create a helper function to log these metrics to wandb, and integrate the logging into the GRPO training loop when configured. Changes
Sequence DiagramsequenceDiagram
participant GRPO as GRPO Training
participant VllmWorker as VllmAsyncGenerationWorker
participant MetricsUtil as metrics_utils
participant Logger as Logger
participant Wandb as W&B Backend
GRPO->>VllmWorker: Collect vLLM metrics
VllmWorker->>VllmWorker: Track kv_cache_usage_perc,<br/>num_preemptions,<br/>generation_tokens
VllmWorker->>GRPO: get_vllm_logger_metrics()
GRPO->>MetricsUtil: log_vllm_metrics_to_wandb()
MetricsUtil->>Logger: log_plot_per_worker_timeline_metrics()
Logger->>Logger: Construct per-worker<br/>time-series plots
Logger->>Wandb: log_plot()
Wandb->>Wandb: Visualize metrics
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
nemo_rl/models/generation/vllm/vllm_generation.py (1)
841-847: vLLM metrics dict looks consistent; consider tightening return type annotationThe new metrics (kv_cache_usage_perc, num_preemptions, generation_tokens) are wired consistently with the existing per‑DP stats. Since the structure is now always
dict[str, dict[int, list[Any]]], you could updateget_vllm_logger_metrics’s return type fromdict[str, Any]to this more precise type to align withlog_vllm_metrics_to_wandband catch mismatches earlier in type-checking.Also applies to: 860-868
nemo_rl/algorithms/grpo.py (1)
40-45: Config‑gated vLLM plotting is wired correctly; consider hardening against empty/None metricsThe new
log_vllm_metrics_to_wandbimport and both call sites are correctly gated on:
policy.generation.vllm_cfg.enable_vllm_metrics_loggerlogger.wandb_enabledand reuse the same
vllm_metrics_logger_intervalthat drives collection. This keeps logging overhead opt‑in and consistent.To make this more future‑proof (especially in the async path where
vllm_logger_metricsstarts asNoneand is only set underNEED_REFIT), it would be safer iflog_vllm_metrics_to_wandbitself returned early when passed a falsy/empty metrics object. That way, these call sites remain simple and won’t break ifNEED_REFITor initialization order changes later.Also applies to: 1468-1478, 2378-2388
nemo_rl/algorithms/utils.py (1)
31-32: Guard vLLM metrics helper against empty/None input and clarify docstring scopeThe helper is correctly wired to the logger’s per‑worker timeline plotting and matches the structure produced by
get_vllm_logger_metrics. Two small improvements would make it more robust and clearer:
- Early return on empty/falsy metrics
If
vllm_logger_metricsis{}(or accidentallyNone), the current loop will still run and try membership checks. A cheap guard avoids that and protects future call‑site refactors:def log_vllm_metrics_to_wandb( @@ - """Log vLLM metrics to wandb. + """Log vLLM metrics as per-worker timelines via the shared Logger. @@ - """ - vllm_metrics_to_plot = [ + """ + if not vllm_logger_metrics: + return + + vllm_metrics_to_plot = [
- Docstring wording
The function uses the generic
Loggerinterface and logs to all configured backends (not just wandb). The updated first line above makes that intent clearer while still matching how it’s used (gated onwandb_enabledingrpo.py).Also applies to: 750-779
nemo_rl/models/generation/vllm/vllm_worker_async.py (1)
170-187: Metric collection for new vLLM fields is thread‑safe and consistent; update comment to match behaviorThe extended metric collection looks correct:
- All five series (inflight_batch_sizes, num_pending_samples, kv_cache_usage_perc, num_preemptions, generation_tokens) are appended under
_vllm_metrics_lockand read/cleared under the same lock, avoiding races.get_vllm_logger_metrics/clear_vllm_logger_metricsnow expose and reset the new fields in a way that matches how the driver aggregates and logs them.One small nit: the comment
# Lazy import inside thread target to avoid import overhead if disabledno longer reflects the code, since the import now lives at the top of
_start_vllm_metrics_loggerrather than inside the thread target. Consider rewording to something like “Lazy import inside the metrics logger setup to avoid module‑level overhead when disabled” to keep future readers aligned with the actual behavior.Also applies to: 191-196, 197-221, 243-250, 257-262
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
nemo_rl/algorithms/grpo.py(3 hunks)nemo_rl/algorithms/utils.py(2 hunks)nemo_rl/models/generation/vllm/vllm_generation.py(2 hunks)nemo_rl/models/generation/vllm/vllm_worker_async.py(4 hunks)nemo_rl/utils/logger.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code
Files:
nemo_rl/algorithms/utils.pynemo_rl/models/generation/vllm/vllm_worker_async.pynemo_rl/models/generation/vllm/vllm_generation.pynemo_rl/algorithms/grpo.pynemo_rl/utils/logger.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes
Files:
nemo_rl/algorithms/utils.pynemo_rl/models/generation/vllm/vllm_worker_async.pynemo_rl/models/generation/vllm/vllm_generation.pynemo_rl/algorithms/grpo.pynemo_rl/utils/logger.py
!(**/tests/**|**/test_*.py|**/test_*.sh)
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year
Files:
nemo_rl/algorithms/utils.pynemo_rl/models/generation/vllm/vllm_worker_async.pynemo_rl/models/generation/vllm/vllm_generation.pynemo_rl/algorithms/grpo.pynemo_rl/utils/logger.py
**/*.{py,sh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)
Files:
nemo_rl/algorithms/utils.pynemo_rl/models/generation/vllm/vllm_worker_async.pynemo_rl/models/generation/vllm/vllm_generation.pynemo_rl/algorithms/grpo.pynemo_rl/utils/logger.py
🧬 Code graph analysis (2)
nemo_rl/algorithms/utils.py (1)
nemo_rl/utils/logger.py (2)
Logger(804-1102)log_plot_per_worker_timeline_metrics(938-997)
nemo_rl/algorithms/grpo.py (1)
nemo_rl/algorithms/utils.py (1)
log_vllm_metrics_to_wandb(750-779)
🪛 Ruff (0.14.5)
nemo_rl/utils/logger.py
962-964: Avoid specifying long messages outside the exception class
(TRY003)
986-986: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
|
Hi @terrykong, this is the PR that enables wandb plot what I shared today. |
terrykong
left a comment
There was a problem hiding this comment.
nice addition! left some comments
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> using wandb native api Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> not to use native api Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> fix legend Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> remove legend Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
91dac96 to
7e35f63
Compare
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
What does this PR do ?
kv_cache_usage_percandgeneration_tokens.Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use thisBefore your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.